Safety issue


The Only Thing Standing Between Humanity and AI Apocalypse Is … Claude?

WIRED

As AI systems grow more powerful, Anthropic's resident philosopher says the startup is betting Claude itself can learn the wisdom needed to avoid disaster. Anthropic is locked in a paradox: Among the top AI companies, it's the most obsessed with safety and leads the pack in researching how models can go wrong. But even though the safety issues it has identified are far from resolved, Anthropic is pushing just as aggressively as its rivals toward the next, potentially more dangerous, level of artificial intelligence. Its core mission is figuring out how to resolve that contradiction. Last month, Anthropic released two documents that both acknowledged the risks associated with the path it's on and hinted at a route it could take to escape the paradox.


Health Department Will Mine Unverified Vaccine Injury Claims With New AI Tool

Mother Jones

Experts worry it will be used to further Robert F. Kennedy Jr.'s anti-vaccine agenda. The US Department of Health and Human Services (HHS) is developing a generative artificial intelligence tool to find patterns across data reported to a national vaccine monitoring database and to generate hypotheses on the negative effects of vaccines, according to an inventory released last week of all use cases the agency had for AI in 2025. The tool has not yet been deployed, according to the HHS document, and an AI inventory report from the previous year shows that it has been in development since late 2023. But experts worry that the predictions it generates could be used by HHS secretary Robert F. Kennedy Jr. to further his anti-vaccine agenda.


HHS Is Making an AI Tool to Create Hypotheses About Vaccine Injury Claims

WIRED

Experts worry Robert F. Kennedy Jr.'s Health Department will use an internal AI tool to analyze vaccine injury claims in a way that furthers his anti-vaccine agenda. The US Department of Health and Human Services is developing a generative artificial intelligence tool to find patterns across data reported to a national vaccine monitoring database and to generate hypotheses on the negative effects of vaccines, according to an inventory released last week of all use cases the agency had for AI in 2025. The tool has not yet been deployed, according to the HHS document, and an AI inventory report from the previous year shows that it has been in development since late 2023. But experts worry that the predictions it generates could be used by Health and Human Services secretary Robert F. Kennedy Jr. to further his anti-vaccine agenda. A long-standing vaccine critic, Kennedy has upended the childhood vaccination schedule in his year in office, removing several shots from a list of recommended immunizations for all children, including those for Covid-19, influenza, hepatitis A and B, meningococcal disease, rotavirus, and respiratory syncytial virus, or RSV.


Large language models provide unsafe answers to patient-posed medical questions

Draelos, Rachel L., Afreen, Samina, Blasko, Barbara, Brazile, Tiffany L., Chase, Natasha, Desai, Dimple Patel, Evert, Jessica, Gardner, Heather L., Herrmann, Lauren, House, Aswathy Vaikom, Kass, Stephanie, Kavan, Marianne, Khemani, Kirshma, Koire, Amanda, McDonald, Lauren M., Rabeeah, Zahraa, Shah, Amy

arXiv.org Artificial Intelligence

Millions of patients are already using large language model (LLM) chatbots for medical advice on a regular basis, raising patient safety concerns. This physician-led red-teaming study compares the safety of four publicly available chatbots--Claude by Anthropic, Gemini by Google, GPT-4o by OpenAI, and Llama3-70B by Meta--on a new dataset, HealthAdvice, using an evaluation framework that enables quantitative and qualitative analysis. In total, 888 chatbot responses are evaluated for 222 patient-posed advice-seeking medical questions on primary care topics spanning internal medicine, women's health, and pediatrics. We find statistically significant differences between chatbots. The rate of problematic responses varies from 21.6 percent (Claude) to 43.2 percent (Llama), with unsafe responses varying from 5 percent (Claude) to 13 percent (GPT-4o, Llama). Qualitative results reveal chatbot responses with the potential to lead to serious patient harm. This study suggests that millions of patients could be receiving unsafe medical advice from publicly available chatbots, and further work is needed to improve the clinical safety of these powerful tools.
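To make the headline comparison concrete, here is a minimal sketch of the kind of significance test the abstract implies. The counts are back-computed from the reported percentages (21.6% and 43.2% of 222 questions per chatbot) and are illustrative reconstructions, not the paper's raw data.

```python
# Hypothetical reconstruction: do problematic-response rates differ
# significantly between two chatbots? Counts are back-computed from the
# abstract's percentages (21.6% and 43.2% of 222 questions each).
from scipy.stats import chi2_contingency

n_questions = 222
problematic = {"Claude": 48, "Llama": 96}  # ~21.6% and ~43.2% of 222

# 2x2 contingency table: [problematic, not problematic] per chatbot
table = [
    [problematic["Claude"], n_questions - problematic["Claude"]],
    [problematic["Llama"], n_questions - problematic["Llama"]],
]
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")  # small p: rates differ significantly
```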


LongSafety: Evaluating Long-Context Safety of Large Language Models

Lu, Yida, Cheng, Jiale, Zhang, Zhexin, Cui, Shiyao, Wang, Cunxiang, Gu, Xiaotao, Dong, Yuxiao, Tang, Jie, Wang, Hongning, Huang, Minlie

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) continue to advance in understanding and generating long sequences, the long context itself introduces new safety concerns. Yet the safety of LLMs in long-context tasks remains under-explored, leaving a significant gap in both the evaluation and the improvement of their safety. To address this, we introduce LongSafety, the first comprehensive benchmark specifically designed to evaluate LLM safety in open-ended long-context tasks. LongSafety encompasses 7 categories of safety issues and 6 user-oriented long-context tasks, with a total of 1,543 test cases averaging 5,424 words per context. Our evaluation of 16 representative LLMs reveals significant safety vulnerabilities, with most models achieving safety rates below 55%. Our findings also indicate that strong safety performance in short-context scenarios does not necessarily correlate with safety in long-context tasks, underscoring the unique challenges and the urgency of improving long-context safety. Through extensive analysis, we identify the safety issues and task types that are most challenging for long-context models, and we find that relevant context and extended input sequences can exacerbate safety risks, highlighting the need for continued attention to long-context safety. Our code and data are available at https://github.com/thu-coai/LongSafety.
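The paper's headline metric, a safety rate, is straightforward to compute once each response has been judged safe or unsafe. The sketch below shows a per-category aggregation; the field names (category, is_safe) are assumptions for illustration, since the actual schema is defined in the LongSafety repository.

```python
# A minimal sketch of per-category safety-rate aggregation, assuming judged
# results are stored one JSON object per line with "category" and "is_safe"
# fields (hypothetical names; see the LongSafety repo for the real schema).
import json
from collections import defaultdict

def safety_rates(results_path: str) -> dict[str, float]:
    safe = defaultdict(int)    # safe responses per category
    total = defaultdict(int)   # all responses per category
    with open(results_path) as f:
        for line in f:
            record = json.loads(line)
            total[record["category"]] += 1
            safe[record["category"]] += int(record["is_safe"])
    return {cat: safe[cat] / total[cat] for cat in total}
```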


CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs

Liu, Zhihao, Hu, Chenhui

arXiv.org Artificial Intelligence

As large language models (LLMs) rapidly evolve, they bring significant conveniences to our work and daily lives, but they also introduce considerable safety risks. These models can generate texts with social biases or unethical content, and under specific adversarial instructions, may even incite illegal activities. Rigorous safety assessments of LLMs are therefore crucial. In this work, we introduce a safety assessment benchmark, CFSafety, which integrates 5 classic safety scenarios and 5 types of instruction attacks, totaling 10 categories of safety questions, to form a test set of 25k prompts. The test set is used to evaluate the safety of LLMs' natural language generation (NLG), scoring each response with a combination of a simple moral judgment and a 1-5 safety rating scale. Using this benchmark, we tested eight popular LLMs, including the GPT series. The results indicate that while GPT-4 demonstrated superior safety performance, the safety of LLMs, including this model, still requires improvement. The data and code associated with this study are available on GitHub.


ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models

Zhang, Hengxiang, Gao, Hongfu, Hu, Qiang, Chen, Guanhua, Yang, Lili, Jing, Bingyi, Wei, Hongxin, Wang, Bing, Bai, Haifeng, Yang, Lei

arXiv.org Artificial Intelligence

With the rapid development of large language models (LLMs), understanding their capability to identify unsafe content has become increasingly important. While previous works have introduced several benchmarks to evaluate the safety risk of LLMs, the community still has a limited understanding of current LLMs' capability to recognize illegal and unsafe content in Chinese contexts. In this work, we present a Chinese safety benchmark (ChineseSafe) to facilitate research on the content safety of large language models. To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography, and variant/homophonic words. Moreover, we employ two methods to evaluate the legal risks of popular LLMs, covering both open-source models and APIs. The results reveal that many LLMs are vulnerable to certain types of safety issues, exposing them to legal risks in China. Our work provides a guideline for developers and researchers to facilitate the safety of LLMs.
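Since the benchmark frames safety as recognizing unsafe content, a natural failure metric is the per-sub-class miss rate: how often a model judges unsafe content as safe. A minimal sketch follows; the (sub_class, gold, verdict) tuple format is an assumed data structure, not the benchmark's actual one.

```python
# Sketch of a per-sub-class miss rate: how often unsafe content is judged
# safe. The (sub_class, gold, verdict) tuple format is an assumption.
from collections import Counter

def miss_rates(examples):
    """examples: iterable of (sub_class, gold_label, model_verdict) tuples."""
    missed, totals = Counter(), Counter()
    for sub_class, gold, verdict in examples:
        if gold == "unsafe":
            totals[sub_class] += 1
            missed[sub_class] += int(verdict == "safe")  # unsafe judged safe
    return {s: missed[s] / totals[s] for s in totals}
```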


Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

Wang, Fei, Mehrabi, Ninareh, Goyal, Palash, Gupta, Rahul, Chang, Kai-Wei, Galstyan, Aram

arXiv.org Artificial Intelligence

Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, with underrepresented or absent aspects and low-quality datapoints. To address these problems, we propose Data Advisor, an enhanced LLM-based method for generating data that takes the characteristics of the desired dataset into account. Starting from a set of pre-defined principles, Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation accordingly. Data Advisor can be easily integrated into existing data generation methods to enhance data quality and coverage. Experiments on the safety alignment of three representative LLMs (Mistral, Llama2, and Falcon) demonstrate the effectiveness of Data Advisor in enhancing model safety against various fine-grained safety issues without sacrificing model utility.
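The abstract describes a closed loop: monitor the generated data, identify weaknesses, and advise the next round of generation. A control-flow sketch of that loop follows; the three callables are placeholders standing in for LLM-backed steps, not the paper's implementation.

```python
# Control-flow sketch of the Data Advisor loop described in the abstract.
# generate/summarize/identify_weakness are placeholder callables standing
# in for LLM-backed steps; this is not the paper's implementation.
def data_advisor_loop(generate, summarize, identify_weakness, n_rounds=10):
    dataset = []
    advice = "cover the pre-defined principles broadly"  # initial guidance
    for _ in range(n_rounds):
        dataset.extend(generate(advice))      # generate a batch, guided by advice
        status = summarize(dataset)           # monitor the current dataset's status
        advice = identify_weakness(status)    # e.g. "underrepresented: self-harm"
    return dataset
```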


Safety challenges of AI in medicine

Wang, Xiaoye, Zhang, Nicole Xi, He, Hongyu, Nguyen, Trang, Yu, Kun-Hsing, Deng, Hao, Brandt, Cynthia, Bitterman, Danielle S., Pan, Ling, Cheng, Ching-Yu, Zou, James, Liu, Dianbo

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence (AI), particularly in deep learning and large language models (LLMs), have accelerated their integration into medicine. However, these developments have also raised public concerns about the safe application of AI. In healthcare, these concerns are especially pertinent, as the ethical and secure deployment of AI is crucial for protecting patient health and privacy. This review examines potential risks in AI practices that may compromise safety in medicine, including reduced performance across diverse populations, inconsistent operational stability, the need for high-quality data for effective model tuning, and the risk of data breaches during model development and deployment. For medical practitioners, patients, and researchers, LLMs provide a convenient way to interact with AI and data through language. However, their emergence has also amplified safety concerns, particularly due to issues like hallucination. The second part of this article explores safety issues specific to LLMs in medical contexts, including limitations in processing complex logic, challenges in aligning AI objectives with human values, the illusion of understanding, and concerns about diversity. Thoughtful development of safe AI could accelerate its adoption in real-world medical settings.


Tesla's Cybertruck disaster: Insider reveals 'serious safety issues' behind scenes of EV rollout - as drone footage shows hundreds of unfinished trucks backed up at Texas factory

Daily Mail - Science & tech

Customer reports that Tesla has halted deliveries of its futuristic Cybertruck amid allegedly dangerous safety issues with its gas pedal come as no surprise to one former insider, engineer Cristina Balan. 'After I left, it got worse,' said Balan, who is suing her former boss Elon Musk's electric car company for defamation. 'I have quite a few people that are right now in Tesla,' Balan said. 'They brought some serious safety issues to my attention.' New Cybertruck owners have described its gas pedal as a 'deathtrap,' demonstrating how the pedal cover can slide off the accelerator and become snagged on the carpet, locking it in place and spurring the car to accelerate at top speed.